OpenAI hallucination rate AI News List

OpenAI hallucination rate AI News List | Blockchain.News

AI News List

List of AI News about OpenAI hallucination rate

Time	Details
2026-01-08 11:23	PersonQA Benchmark Reveals Increasing Hallucination Rates in OpenAI Models: o1 vs o3 vs o4-mini According to God of Prompt (@godofprompt), recent results from the PersonQA benchmark demonstrate a concerning trend in OpenAI's large language models. The hallucination rate increased significantly with each new model iteration: OpenAI o1 exhibited a 16% hallucination rate, o3 rose to 33%, and o4-mini reached 48%. These findings suggest that newer versions are not addressing, and may even be amplifying, the issue of factual inaccuracy in AI-generated content. This trend exposes a critical challenge for enterprise AI adoption, as increased hallucinations can undermine trust, limit business applications in sensitive domains, and raise regulatory concerns. Companies deploying OpenAI models should carefully evaluate model performance on domain-specific benchmarks and demand transparency in model updates to mitigate risks. (Source: God of Prompt @godofprompt, Jan 8, 2026) Source

Time

Details

2026-01-08
11:23

PersonQA Benchmark Reveals Increasing Hallucination Rates in OpenAI Models: o1 vs o3 vs o4-mini

According to God of Prompt (@godofprompt), recent results from the PersonQA benchmark demonstrate a concerning trend in OpenAI's large language models. The hallucination rate increased significantly with each new model iteration: OpenAI o1 exhibited a 16% hallucination rate, o3 rose to 33%, and o4-mini reached 48%. These findings suggest that newer versions are not addressing, and may even be amplifying, the issue of factual inaccuracy in AI-generated content. This trend exposes a critical challenge for enterprise AI adoption, as increased hallucinations can undermine trust, limit business applications in sensitive domains, and raise regulatory concerns. Companies deploying OpenAI models should carefully evaluate model performance on domain-specific benchmarks and demand transparency in model updates to mitigate risks. (Source: God of Prompt @godofprompt, Jan 8, 2026)

Source